Ease of Use with Concurrent Collections (CnC)

Author

  • Kathleen Knobe
Abstract

Parallel programming is hard. We present a new approach called Concurrent Collections (CnC). This paper briefly explains why writing a parallel program is hard in the current environment and introduces our new approach based on this perspective. In particular, a CnC program doesn't explicitly express the parallelism; it expresses the constraints on parallelism. These constraints remain valid regardless of the target architecture.

1. Why is parallel programming hard?

Many parallel languages embed parallel language constructs within the text of the serial code. Examples include MPI, OpenMP, PThreads, Ct, etc. This embedding is the source of some unnecessary difficulties:

  • Serial code requires a serial ordering. If there is no semantically required ordering among some blocks of code, an arbitrary ordering must be specified.
  • Serial code modifies and refers to variables (locations), not values. Variables can be overwritten. This overwriting over-constrains the possible orderings.
  • Serial code tightly couples the question of if we will execute code with when we will execute it. Arriving at some point in the control flow indicates both that, yes, we will execute this code and also that we will execute it now. This is true for loop iterations, recursive calls and invocations of other subroutines. These also constitute arbitrary orderings.

Converting serial code to parallel code involves uncovering alternate valid executions, either manually or automatically. In the presence of arbitrary ordering, this process requires a complex analysis (human or machine). Embedding parallel language constructs or pragmas in the midst of this problem again requires uncovering alternate valid executions. This is difficult to get right in the first place and to modify later.
In addition, of course, the parallelism constructs might

  • be for a constrained class of architectures (say, shared memory)
  • focus on a limited type of parallelism (say, data parallelism)

So when the architecture changes, so must the code. For these reasons, embedding parallelism in serial code can limit both the language's effectiveness and its ease of use.[1] In addition, these approaches might impose arbitrary constraints such as barriers after each loop or single-program-multiple-data (SPMD). Although this is not the focus of this paper, notice that these assumptions can also inhibit performance.

[1] It is not hard to find an ordering, but it can be complicated for a program or a compiler to undo the ordering.

2. The essence of parallel execution

What does a runtime system need to know in order to execute a program in parallel? We are not yet asking how to specify the parallelism, how to optimize for any specific target, etc. We are just asking: what are the inputs to this decision?

We need to identify the semantically required scheduling constraints. These are:

  • Data dependences (producer/consumer relations): One computation produces data consumed by another. Data is explicitly produced by a producer computation and explicitly consumed by (possibly multiple) consumer computations.
  • Control dependences (controller/controllee relations): One computation determines if another will execute. To eliminate the tight coupling of the if and when control flow questions, control tags will be explicitly produced by a controller computation and will control the execution of a controllee computation.

This puts the control and data dependences on the same level, as in intermediate forms such as program dependence graphs [5].

The types of objects that need to be identified are:

  • The computations, i.e., the high-level operations, in the application.
  • The data structures that participate in data dependences among these high-level operations.
  • The control tags that participate in control dependences among these high-level operations.

The relationships among these objects that need to be identified are:

  • producer/consumer relations
  • controller/controllee relations

As we see below, these three types of objects and the relations among them are exactly what Concurrent Collections provides.

3. What is Concurrent Collections?

CnC relies on a combination of ideas from tuple-space [6], streaming [7] and dataflow [8] languages. CnC programs are written in terms of high-level application-specific operations. These operations are partially ordered according to their semantically required scheduling constraints only. The data that flows among these operations is by value, not by location. There is no overwriting and no arbitrary serialization among the high-level operations. The high-level operations themselves are implemented in a serial language.

This approach supports an important separation of concerns. There are two roles involved in implementing a parallel program. One is the domain expert, the developer whose interest and expertise is in the application domain, e.g., finance, genomics, etc. The other is the tuning expert, whose interest and expertise is in performance, possibly performance on a particular platform. These may be distinct individuals or the same individual at different stages in application development. The tuning expert may in fact be software (static compiler analysis or dynamic runtime analysis). The Concurrent Collections programming model separates the expression of the semantics of the computation (by the domain expert) from the expression of the actual parallelism, scheduling and distribution for a specific architecture (by the tuning expert). This separation simplifies the work of the domain expert. Writing in this language does not require any reasoning about parallelism or any understanding of the target architecture.
The domain expert is concerned only with her area of expertise (the semantics of the application). The tuning expert is given the maximum possible freedom to map the computation onto his target architecture. In this document we will focus on topics relevant to the domain expert.

3.1. Language concepts via an example

We will use face detection as an example application to describe the language. Detection is performed on a sequence of images. Each image is further subdivided into square sub-images (called windows) of any size and at any position within the image. Each window is processed by a sequence of classifiers. If any classifier in the sequence fails, the window does not contain a face and the remainder of the classifiers need not process that window. The goal of this approach is to rapidly reject any window not containing a face. This example is chosen because it relies heavily on control dependences, which enable Concurrent Collections to support more than pure streaming applications.

A program is specified by a graph with three types of nodes (computation steps, data items and control tags) and three types of edges (producer relations, consumer relations and prescription relations). We will introduce the language by showing the process one might go through to create a version of the face detector in this language. This discussion refers to Figure 3-1, which shows a simplified graphical representation of our face detection application.

Figure 3-1 Face detection: graphical form

3.1.1. Creating a CnC graph

1. What are the high-level operations in the application?

The computation is partitioned into high-level operations called step collections. Step collections are represented as ovals. In this application, the step collections are the classifiers C1, ..., Cn. We use the term step collection to indicate that it is a collection composed of distinct step instances, which are the unit of scheduling, distribution and execution.

2.
What data is produced/consumed by these operations?

Similarly, the user data is partitioned into data structures called item collections. Item collections are represented by rectangles. Again we use the term collection to indicate that it is a collection composed of distinct item instances. In this application there is only one item collection, image. Item instances are the units of storage, communication and synchronization.

The producer and consumer relationships between step collections and item collections are explicit. The consumer relationships are represented as directed edges into steps. Producer relations are represented as directed edges out from steps. The image items are consumed by the classifier steps. There are no items produced in this application. The environment (the code that invokes the graph) may produce and consume items and tags. These relationships are represented by directed squiggly edges. In our application, for example, the environment produces image items.

At this point we have a description that is typical of how people communicate informally with one another at a whiteboard. The next two phases are required to make this informal description precise enough to execute.

3. What distinguishes instances of data and operations?

The computations represented by ovals are not long-lived computations that continually consume input and produce output. This would constitute another arbitrary ordering. Instead, scheduling and distribution is on step instances. Synchronization and communication is on item instances. We need to distinguish among the instances of a step collection and instances of an item collection. Each dynamic instance of a step or an item is uniquely identified by an application-specific tag. A tag component might indicate a node identifier in a graph, a row number in an array, an employee number, a year, etc.
A complete tag might be composed of several components, for example, employee number and year, or maybe xAxis, yAxis, and zAxis. In our example, the instances of the image collection are distinguished by image#. The classifier step instances are distinguished by an image# and window# pair. In this example, a classifier step inputs the whole image even though it operates only on one window within the image.

4. What are the actual instances of data and operations?

Knowing the tag components that allow us to distinguish among instances is not quite precise enough to execute. Knowing that we distinguish instances of classifier1 by values of image# and window# doesn't tell us if classifier1 is to be executed for image#2873, window#56. We have already introduced item collections for data and step collections for computation. Now we introduce tag collections for control, to specify exactly which instances will execute.

Tag collections, sets of tag instances, provide the control mechanism. Tag collections are shown in triangles.[2] The tag collections in this graph are T1, ..., Tn. A prescriptive relation may exist between a tag collection T and a step collection S. The meaning of such a relationship is this: if a tag instance t, say image# 2873, window# 56, is in collection T, then the step instance s in S with tag value t, image# 2873, window# 56, will execute. A prescriptive relation is shown as a dotted edge between a tag collection and a step collection. A step collection S prescribed by a tag collection T must have tags of the same form as tags in T. Thus we know the form of the tags for the classifiers.

[2] The directed edges from steps to the triangles are discussed below.

Usually control flow indicates not only if code executes but also when. In CnC, the control via tags only indicates if code executes. When it executes is up to a subsequent scheduler. When we add a tag collection to our specification, we have to add the corresponding producer relation.
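The prescriptive relation described above can be sketched in a few lines of C++. This is a minimal toy, not the CnC library API: a tag collection holds a set of tag instances and, when a new tag is put, causes the prescribed step instance to execute. Running the step eagerly on Put is just one legal scheduling choice, since CnC only says that the step instance will execute, not when.

```cpp
#include <functional>
#include <set>
#include <utility>

using Tag = std::pair<int, int>;  // (image#, window#)

// Toy tag collection T prescribing a step collection S (our own
// minimal types; the real CnC class library differs).
struct TagCollection {
    std::set<Tag> tags;                        // tag instances seen so far
    std::function<void(Tag)> prescribed_step;  // the step collection S

    void Put(Tag t) {
        // A tag collection is a set: a duplicate Put prescribes nothing new.
        if (tags.insert(t).second && prescribed_step)
            prescribed_step(t);
    }
};

static int executed = 0;  // how many step instances actually ran

int demo() {
    TagCollection T1;
    T1.prescribed_step = [](Tag) { ++executed; };
    T1.Put({2873, 56});  // prescribes step instance with tag (2873, 56)
    T1.Put({2873, 56});  // duplicate tag: no second execution
    T1.Put({2873, 57});  // a distinct tag: a distinct step instance
    return executed;
}
```

A real scheduler could instead record the tag and run the step instance later, on another core or another node; nothing in the prescriptive relation itself constrains that choice.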
For example, the environment produces T1, which indicates all the windows for all the images. The point of step collection C1 is to determine which of these windows definitely do not contain a face and which might contain a face. An instance of C1, say with tag image# i and window# w, will either produce a T2 tag with image# i and window# w (indicating that the window might contain a face) or it will produce nothing (indicating that it definitely does not). So step collection C1 produces tag collection T2. The tag instances in T2 determine which instances of C2 will execute. Similarly, the other step collections and tag collections have producer relationships.

5. What are the relationships among instances?

To understand the constraints on parallelism we need more specifics about the relations among instances. Tag functions provide this information. In our example, the producer tag function that maps the tag of a classifier step, say (C1), to the tag of the tag collection it produces is the identity function: (C1: i, w) can only produce <T2: i, w>, not a tag for some other image or window. Other applications, for example nearest neighbor computations, have more interesting tag functions. What is important is that tag functions require only domain knowledge, not an understanding of parallelism.

At this point the importance of tag collections and tag instances should be clear. Tags make this language more flexible and more general than a streaming language. In addition, the tag mechanism separates the question of if a step will execute from when a step will execute. The domain expert determines if a step will execute. The tuning expert determines when it will execute. This separation not only allows for more effective tuning, it makes the job of the domain expert easier.

3.1.2. Textual representation

Concurrent Collections can be represented in a variety of distinct forms.
A textual form of the graph represents each relationship in a separate statement, using parens, square brackets and angle brackets instead of ovals, rectangles and triangles. For example,

    <T1: image#, window#> :: (C1: image#, window#);

A translator converts this form to use a CnC class library. One can specify the graph directly in the CnC class library. We are currently investigating a graphical form that looks more like Figure 3-1.

3.1.3. Coding the high-level operators

In addition to specifying the graph, we need to code the steps in a serial language. The step has access to the values of its tag components. It uses get operations to consume items and put operations to produce items and tags. An example of step code showing the API for the current implementation is shown below.

    void c1(facedetector_graph_t& graph, const Tag_t& step_tag)
    {
        // Retrieve the image
        image_t x = graph.image.Get(step_tag);
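The step code above is cut off mid-function. The following self-contained sketch suggests how such a step might continue, using toy stand-ins for the CnC class library: only the names facedetector_graph_t, Tag_t, image_t, Get and Put come from the fragment above; the type definitions and the classifier predicate are our assumptions, invented for illustration.

```cpp
#include <map>
#include <set>
#include <utility>

using Tag_t   = std::pair<int, int>;  // (image#, window#)
using image_t = int;                  // placeholder image payload

struct ImageCollection {              // toy [image] item collection
    std::map<int, image_t> items;     // keyed by image#
    void Put(int image_no, image_t v) { items[image_no] = v; }
    // A classifier step inputs the whole image, so Get keys on image# only.
    image_t Get(const Tag_t& t) const { return items.at(t.first); }
};

struct TagCollection {                // toy <T2> tag collection
    std::set<Tag_t> tags;
    void Put(const Tag_t& t) { tags.insert(t); }
};

struct facedetector_graph_t {
    ImageCollection image;  // [image]
    TagCollection   T2;     // <T2>: windows classifier1 has not rejected
};

// Toy classifier: "might be a face" iff the payload is nonzero.
static bool classifier1_accepts(image_t img, const Tag_t&) { return img != 0; }

void c1(facedetector_graph_t& graph, const Tag_t& step_tag) {
    // Retrieve the image (as in the fragment above).
    image_t x = graph.image.Get(step_tag);
    // Identity tag function: either put <T2: image#, window#> or nothing.
    if (classifier1_accepts(x, step_tag))
        graph.T2.Put(step_tag);
}
```

A rejected window simply produces no tag, so no later classifier instance is ever prescribed for it; this is how the rejection cascade prunes work without any explicit control flow between steps.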


Similar resources

The Concurrent Collections Programming Model

We introduce the Concurrent Collections (CnC) programming model. In this model, programs are written in terms of high-level operations. These operations are partially ordered according to only their semantic constraints. These partial orderings correspond to data dependences and control dependences. The role of the domain expert, whose interest and expertise is in the application domain, and th...


Concurrent Collections (CnC)

CnC is a system for describing the structure of parallel computation, or coordinating the data- and control-flow between the individual steps of a computation [2, 3]. A CnC application specifies a set of discrete step functions, and the data collections used as input to and output from those step functions. The CnC coordination language describes the relationship between a specific invocation of...


Concurrent Collections

We introduce the Concurrent Collections (CnC) programming model. CnC supports flexible combinations of task and data parallelism while retaining determinism. CnC is implicitly parallel, with the user providing high-level operations along with semantic ordering constraints that together form a CnC graph. We formally describe the execution semantics of CnC and prove that the model guarantees dete...


Cluster Computing using Intel Concurrent Collections

The Intel Corporation is developing a new parallel software and compiler called Concurrent Collections (CnC) to make programming in parallel easier for the user. CnC provides a system of collections comprised of steps, items, and tags. A CnC user specifies their algorithm in a graph representation using these constructs. Using this graph of dependencies, CnC automatically identifies paralleliza...


A Case Study in Coordination Programming: Performance Evaluation of S-Net vs Concurrent Collections

We present a programming methodology and runtime performance case study comparing the declarative data flow coordination language S-NET with Intel’s Concurrent Collections (CnC). As a coordination language S-NET achieves a near-complete separation of concerns between sequential software components implemented in a separate algorithmic language and their parallel orchestration in an asynchronous...


The Concurrent Collections Programming Model

Concurrent Collections (CnC) is a parallel programming model, with an execution semantics that is influenced by dynamic dataflow, stream-processing, and tuple spaces. The three main constructs in the CnC programming model are step collections, data collections, and control collections. A step collection corresponds to a computation, and its instances correspond to invocations of that computatio...



Journal title:

Volume   Issue

Pages  -

Publication date: 2009